AUTOMATED METHOD AND SYSTEM FOR ADVANCED NON-PARAMETRIC

CLASSIFICATION OF MEDICAL IMAGES AND LESIONS 

[001] The present invention was made in part with U.S. Government support under NIH Grant
R01 CA89452. The U.S. Government may have certain rights to this invention.

CROSS REFERENCE TO RELATED APPLICATIONS 

[002] The present application is related to and claims the benefit of provisional U.S. Patent 
Application No. 60/429,538, filed on November 29, 2002, the entire contents of which are 
incorporated herein by reference. 

BACKGROUND OF THE INVENTION

Field of the Invention 

[003] The invention relates generally to the field of computer-aided diagnosis (CAD),
including detection, characterization, diagnosis, and/or assessment of normal and diseased
states (including lesions).

[004] The present invention also generally relates to computerized techniques for automated
analysis of digital images, for example, as disclosed in one or more of U.S. Patents 4,839,807;
4,841,555; 4,851,984; 4,875,165; 4,907,156; 4,918,534; 5,072,384; 5,133,020; 5,150,292;
5,224,177; 5,289,374; 5,319,549; 5,343,390; 5,359,513; 5,452,367; 5,463,548; 5,491,627;
5,537,485; 5,598,481; 5,622,171; 5,638,458; 5,657,362; 5,666,434; 5,673,332; 5,668,888;
5,732,697; 5,740,268; 5,790,690; 5,832,103; 5,873,824; 5,881,124; 5,931,780; 5,974,165;
5,982,915; 5,984,870; 5,987,345; 6,011,862; 6,058,322; 6,067,373; 6,075,878; 6,078,680;
6,088,473; 6,112,112; 6,138,045; 6,141,437; 6,185,320; 6,205,348; 6,240,201; 6,282,305;
6,282,307; 6,317,617; as well as U.S. patent applications 08/173,935; 08/398,307 (PCT
Publication WO 96/27846); 08/536,149; 08/900,189; 09/027,468; 09/141,535; 09/471,088;
09/692,218; 09/716,335; 09/759,333; 09/760,854; 09/773,636; 09/816,217; 09/830,562;
09/818,831; 09/842,860; 09/860,574; 60/160,790; 60/176,304; 60/329,322; 09/990,311;
09/990,310; 60/332,005; and 60/331,995; as well as co-pending U.S. patent applications
(listed by attorney docket number) 215752US-730-730-20, 216439US-730-730-20, and



references identified in the following List of Non-Patent References by the author(s) and year
of publication and cross referenced throughout the specification by reference to the respective
number, in parentheses, of the reference:
[005] List of Non-Patent References

1. Feig SA: Decreased breast cancer mortality through mammographic screening: Results
of clinical trials. Radiology 167:659-665, 1988.

2. Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grontoft O: Update of the Swedish
two-county program of mammographic screening for breast cancer. Radiol Clin North
Am 30:187-210, 1992.

3. Smart CR, Hendrick RE, Rutledge JH, Smith RA: Benefit of mammography screening
in women ages 40 to 49 years: Current evidence from randomized controlled trials.
Cancer 75:1619-26, 1995.

4. Bassett LW, Gold RH: Breast Cancer Detection: Mammography and Other Methods in
Breast Imaging. New York: Grune and Stratton, 1987.

5. Kopans DB: Breast Imaging. Philadelphia: JB Lippincott, 1989.

6. Brown ML, Houn F, Sickles EA, Kessler LG: Screening mammography in community
practice: positive predictive value of abnormal findings and yield of follow-up
diagnostic procedures. AJR 165:1373-1377, 1995.

7. Giger ML: Computer-aided diagnosis. In: Syllabus: A Categorical Course on the
Technical Aspects of Breast Imaging, edited by Haus A, Yaffe M. Oak Brook, IL:
RSNA Publications, 1993, pp. 272-298.

8. Vyborny CJ, Giger ML: Computer vision and artificial intelligence in mammography.
AJR 162:699-708, 1994.

9. Giger ML, Huo Z, Kupinski MA, Vyborny CJ: "Computer-aided diagnosis in
mammography", In: Handbook of Medical Imaging, Volume 2. Medical Imaging
Processing and Analysis, (Sonka M, Fitzpatrick MJ, eds) SPIE, pp. 915-1004, 2000.

10. D'Orsi CJ, Bassett LW, Feig SA, Jackson VP, Kopans DB, Linver MN, Sickles EA,
Stelling CB: Breast Imaging Reporting and Data System (BI-RADS). Reston, VA
(American College of Radiology), 1998.

11. Getty DJ, Pickett RM, D'Orsi CJ, Swets JA: Enhanced interpretation of diagnostic images.
Invest. Radiol. 23: 240-252, 1988.

12. Swets JA, Getty DJ, Pickett RM, D'Orsi CJ, Seltzer SE, McNeil BJ: Enhancing and
evaluating diagnostic accuracy. Med Decis Making 11:9-18, 1991.

13. Cook HM, Fox MD: Application of expert systems to mammographic image analysis.
American Journal of Physiologic Imaging 4: 16-22, 1989.

14. Gale AG, Roebuck EJ, Riley P, Worthington BS, et al.: Computer aids to
mammographic diagnosis. British Journal of Radiology 60: 887-891, 1987.

15. Getty DJ, Pickett RM, D'Orsi CJ, Swets JA: Enhanced interpretation of diagnostic
images. Invest. Radiol. 23: 240-252, 1988.

16. Swett HA, Miller PA: ICON: A computer-based approach to differential diagnosis in
radiology. Radiology 163: 555-558, 1987.

17. Huo Z, Giger ML, Vyborny CJ, Bick U, Lu P, Wolverton DE, Schmidt RA: Analysis of
spiculation in the computerized classification of mammographic masses. Medical
Physics 22:1569-1579, 1995.

18. Jiang Y, Nishikawa RM, Wolverton DE, Giger ML, Doi K, Schmidt RA, Vyborny CJ:
Automated feature analysis and classification of malignant and benign clustered
microcalcifications. Radiology 198(3):671-678, 1996.

19. Ackerman LV, Gose EE: Breast lesion classification by computer and xeroradiography.
Breast Cancer 30:1025-1035, 1972.

20. Patrick EA, Moskowitz M, Mansukhani VT, Gruenstein EI: Expert learning system
network for diagnosis of breast calcifications. Invest Radiol 16: 534-539, 1991.

21. Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Schmidt RA, Doi K: Automated
computerized classification of malignant and benign mass lesions on digitized
mammograms. Academic Radiology 5: 155-168, 1998.

22. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K: Improving breast
cancer diagnosis with computer-aided diagnosis. Academic Radiology 6:22-33, 1999.

23. Huo Z, Giger ML, Metz CE: Effect of dominant features on neural network
performance in the classification of mammographic lesions. PMB 44: 2579-2595,
1999.

24. Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Metz CE: Computerized classification
of benign and malignant masses on digitized mammograms: a robustness study.
Academic Radiology 7:1077-1084, 2000.

25. American Cancer Society. Cancer Facts and Figures-1998. New York, NY. 1998; p. 20.

26. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21:720-733.

27. Efromovich, Sam. "Nonparametric curve estimation: methods, theory and applications".
Springer, New York, 1999.

28. Silverman, B. W. "Density Estimation for Statistics and Data Analysis", Chapman and
Hall, London, New York, 1986.

29. Zhou KH, Hall WJ, Shapiro DE. "Smooth non-parametric receiver operating
characteristic (ROC) curves for continuous diagnostic tests". Stat Med., 1997,
16(19):2143-56.

[006] The following patents and patent applications may be considered relevant to the field 
of the present invention: 

30. Doi K, Chan H-P, Giger ML: Automated systems for the detection of abnormal
anatomic regions in a digital x-ray image. U.S. Pat. No. 4,907,156, March 1990.

31. Giger ML, Doi K, Metz CE, Yin F-F: Automated method and system for the detection
and classification of abnormal lesions and parenchymal distortions in digital medical
images. U.S. Pat. No. 5,133,020, July 1992.

32. Doi K, Matsumoto T, Giger ML, Kano A: Method and system for analysis of false
positives produced by an automated scheme for the detection of lung nodules in
digital chest radiographs. U.S. Pat. No. 5,289,374, February 1994.

33. Nishikawa RM, Giger ML, Doi K: Method for computer-aided detection of clustered
microcalcifications from digital mammograms. U.S. Pat. No. 5,537,485, July 1996.

34. Giger ML, Doi K, Lu P, Huo Z: Automated method and system for improved
computerized detection and classification of mass in mammograms. U.S. Pat. No.
5,832,103, November, 1998.

35. Giger ML, Bae K, Doi K: Automated method and system for the detection of lesions
in medical computed tomographic scans. U.S. Pat. No. 5,881,124, March, 1999.

36. Bick U, Giger ML: Method and system for the detection of lesions in medical images.
U.S. Pat. Allowed.

37. Giger ML, Zhang M, Lu P: Method and system for the detection of lesions and
parenchymal distortions in mammograms. U.S. Pat. No. 5,657,362, August, 1997.

38. Giger ML, Kupinski MA: Automatic analysis of lesions in medical images. U.S. Pat.
6,138,045, Oct. 24, 2000.

39. Huo Z, Giger ML: Method and system for the computerized assessment of breast
cancer risk. U.S. Pat. 6,282,305, August 28, 2001.

40. Giger ML, Al-Hallaq H, Wolverton DE, Bick U: Method and system for the
automated analysis of lesions in ultrasound images. U.S. Pat. 5,984,870, Nov. 16,
1999.

41. Gilhuijs K, Giger ML, Bick U: Method and system for the automated analysis of
lesions in magnetic resonance images. U.S. Pat. 08/900,188 allowed.

42. Gilhuijs K, Giger ML, Bick U: Method and system for the assessment of tumor 
extent. U.S. Pat. 09/156,413, allowed; 

43. Armato SG, Giger ML, MacMahon H: Method, system and computer readable 
medium for the two-dimensional and three-dimensional detection of lesions in 
computed tomography scans. U.S. Pat. Pending; 

44. Giger ML, Vyborny CJ, Huo Z, Lan L: Method, system and computer readable 
medium for an intelligent search workstation for computer assisted interpretation of 
medical images. U.S. Pat. pending, 09/773,636; and 

45. Drukker K, Giger ML, Horsch K, Vyborny CJ: Automated method and system for the 
detection of abnormalities in sonographic images. U.S. Pat. Pending 60/332,005. 

[007] The contents of each of these references, including patents and patent applications, are 
incorporated herein by reference. The techniques disclosed in the patents, patent applications 
and other references can be utilized as part of the present invention. 

Discussion of the Background 

[008] The inventors' research, findings and analysis are discussed in this Background 
section along with that of others; accordingly, discussion in this section does not constitute an 
admission that the discussed material constitutes "prior art." 

[009] Breast cancer remains a disease without a cure unless it is found at a sufficiently early 
stage and subsequently surgically removed, irradiated, or eradicated with chemotherapy. 
Major research issues include those focused on genetic and molecular forms of detection and 
treatment, and those focused on anatomical levels of prevention, detection, and treatment. In 
these various areas, the role of the human interpreter (e.g., oncologist, radiologist, pathologist, 
surgeon, primary care physician) varies. However, the very presence of a human interpreter 
introduces subjective judgment into the decision-making process - whether it be in the initial 
detection (or miss) of a lesion on a mammogram or in the surgical decision regarding the type 
of incision. Thus, while ongoing research is needed in the biological aspects of cancer, in the 
physical aspects of instrumentation to better "see" the cancer, and in the 
biological/chemical/physical aspects of therapy, research is also needed for improving the role 
of the human in the overall management of the patient. Multi-modality and multi-disciplinary 
decision making on patient management, requiring inputs from oncologists, pathologists,
radiologists, surgeons, and risk clinic physicians, can be quite subjective, as is often evident
during case management conferences. Although "subjective" does not necessarily mean "poor 
judgement", it does permit sub-optimal and inconsistent decision making. 
[0010] Breast cancer is the leading cause of death for women in developed countries. 
Detection of breast cancer in an early stage increases success of treatment dramatically, and 
hence screening for breast cancer of women over 40 years of age is generally recommended. 
Current methods for detecting and diagnosing breast cancer include mammography, 
sonography (also referred to as ultrasound), and magnetic resonance imaging (MRI). 
[0011] Mammography is the most effective method for the early detection of breast cancer, 
and it has been shown that periodic screening of asymptomatic women does reduce mortality 
(Refs. 1-6). Many breast cancers are detected and referred for surgical biopsy on the basis of 
a radiographically detected mass lesion or cluster of microcalcifications. Although general 
rules for the differentiation between benign and malignant mammographically identified 
breast lesions exist, considerable misclassification of lesions occurs with the current methods. 
On average, less than 30% of masses referred for surgical breast biopsy are actually 
malignant. 

[0012] Computerized analysis schemes are being developed to aid in distinguishing between 
malignant and benign lesions in order to improve both sensitivity (true positive rate) and 
specificity (true negative rate). Comprehensive summaries of investigations in the field of 
mammography CAD (computer aided diagnosis) have been published by Giger and 
colleagues (Refs. 7-9). Investigators have used computers to aid in the decision-making 
process regarding likelihood of malignancy and patient management using human-extracted 
features and BI-RADS (Refs. 10-13). Such methods are dependent on the subjective 
identification and interpretation of the mammographic data by human observers. Gale et al. 
(Ref. 14) and Getty et al. (Ref. 15) both developed computer-based classifiers, which take as 
input diagnostically-relevant features obtained from radiologists' readings of breast images.
Getty et al. found that with the aid of the classifier, community radiologists performed as well
as unaided expert mammographers in making benign-malignant decisions. Swett et al.
(Ref. 16) developed an expert system to provide visual and cognitive feedback to the
radiologist using a critiquing approach combined with an expert system. Other investigators 
have been developing methods based on computer-extracted features (Refs. 17-24). The 
benefit of using computer-extracted features is the objectivity and reproducibility of the 
result. Radiologists employ many radiographic image features, which they seem to extract 
and interpret simultaneously and instantaneously. Thus, the development of methods using
computer-extracted features requires, besides the determination of which individual features
are clinically significant, the computerized means for the extraction of each such feature.
Spatial features, which are characteristic of lesions, have been shown to be extractable by a
computer analysis of the mammograms and to be useful in distinguishing between malignant
and benign lesions. Most methods are evaluated in terms of their ability to distinguish between
malignant and benign lesions; however, a few have been evaluated in terms of patient
management (i.e., return to screening vs. biopsy). It is important to state that while one of the
aims of computerized classification is to increase sensitivity (true positive rate), another aim
of computerized classification is to reduce the number of benign cases sent for biopsy. Such
a reduction will be clinically acceptable only if it does not result in unbiopsied malignant
cases, however, since the "cost" of a missed cancer is much greater than that of misclassifying a
benign case. Thus, computer classification schemes should be developed to improve
specificity (true negative rate) but not at the loss of sensitivity (true positive rate). We have
shown that the computerized analysis of mass lesions (Refs. 17, 21) and clustered
microcalcifications (Refs. 18, 22) on digitized mammograms yields performance similar to
that of an expert mammographer and significantly better than that of average radiologists in
the task of distinguishing between malignant and benign lesions.

[0013] We are investigating the potential usefulness of computer-aided diagnosis as an aid to
radiologists in the characterization and classification of mass lesions in mammography.
Observer studies have shown that such a system can aid in increasing the diagnostic accuracy
of radiologists both in terms of sensitivity (true positive rate) and specificity (true negative
rate). Our mass classification method includes three components: 1) automated segmentation
of mass regions, 2) automated feature-extraction, and 3) automated classification. The
method was initially trained with 95 mammograms containing masses from 65 patients.
Features related to the margin, shape, and density of each mass are extracted automatically
from the image data and merged into an estimate of the likelihood of malignancy (Refs. 17,
21, 23, 24). These features include a spiculation measure (Figure 1), a margin definition
feature (Figure 2), and two density measures. The round-robin performance of the computer
in distinguishing between benign and malignant masses was evaluated by receiver operating
characteristic (ROC) analysis (Ref. 21). Our computer classification scheme yielded an Az
value of 0.94, similar to that of an experienced mammographer (Az=0.91) and statistically
significantly higher than the average performance of five radiologists with less
mammographic experience (Az=0.81) (Figure 3). With the database we used, the computer
scheme achieved, at 100% sensitivity, a positive predictive value of 83%, which was 12%
higher than that of the experienced mammographer and 21% higher than that of the average
performance of the less experienced mammographers at a p-value of less than 0.001
(Ref. 21).

[0014] The computerized mass classification method was independently evaluated on a
110-case database consisting of 50 malignant and 60 benign cases (Ref. 24). The effects of
variations in both case mix and in film digitization technique on the performance of the
method were assessed. Categorization of lesions as malignant or benign using the computer
achieved an Az value (area under the receiver operating characteristic (ROC) curve) of 0.90
on the prior training database (Fuji scanner digitization) in a round-robin evaluation, and Az
values of 0.82 and 0.81 on the independent database for Konica and Lumisys digitization
formats, respectively. However, in the statistical comparison of these performances, we
failed to show a statistically significant difference between the performance on the training
database and that on the independent validation database (p-values > 0.10). Thus, our
computer-based method for the classification of lesions on mammograms was shown to be
robust to variations in case mix and film digitization technique (Ref. 24).

[0015] Subsequently we have developed advanced classifiers for the merging of features - 
characteristics of the lesion or image - into a probability or status of disease. These 
classifiers have potential to aid in the development of CAD methods in a limited database 
scenario. 

SUMMARY OF THE INVENTION 
[0016] Accordingly, an object of this invention is to provide a method and system that 
classifies images using non-parametric classification. 

[0017] Another object of this invention is to provide a method and system that
classifies lesions using non-parametric classification.

[0018] Another object of this invention is to provide a method and system that
classifies disease status using non-parametric classification.

[0019] Another object of this invention is to provide a method and system that perform
computerized differential diagnosis of medical images using non-parametric classification.

[0020] These and other objects are achieved according to the invention by providing a new
automated method and system that classifies lesions or medical images in which the analysis
method involves non-parametric classification.

[0021] Preferred embodiments of the present invention provide a method and system that
employ a lesion characterization module. A specific embodiment is a computerized method
for the characterization of mammographic lesions combined with a computerized method for
the classification of the lesions using non-parametric classification.

[0022] According to other aspects of the present invention, there are provided novel systems
implementing the methods of this invention, and novel computer program products that upon
execution cause the computer system to perform the method of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS 
[0023] A more complete appreciation of the invention and many of the attendant advantages
thereof will be readily obtained as the same becomes better understood by reference to the
following detailed description when considered in connection with the accompanying
drawings, in which like reference numerals refer to identical or corresponding parts
throughout the several views, and in which:

[0024] Figure 1 is an illustration showing the overall methods for the computerized analysis
of image data in CAD. These include detection, segmentation, characterization, and
classification;

[0025] Figure 2(a) is an illustration defining the radial angle as the angle between the
direction of the maximum gradient and its radial direction; Figures 2(b) and 2(c) are
illustrations showing normalized cumulated edge-gradient distributions for spiculated masses
and circular masses, respectively;

[0026] Figure 3 shows the relationship between measures of spiculation and margin
definition for malignant and benign mammographic masses;

[0027] Figure 4 illustrates results of a test using an embodiment of the present invention;

[0028] Figure 5 illustrates estimation results for various features;

[0029] Figure 6 illustrates the effect of varying kernel size in the present invention;

[0030] Figure 7 illustrates results of a test of one embodiment of the present invention;

[0031] Figure 8 illustrates corresponding test result distribution; and

[0032] Figure 9 illustrates the effect of kernel size on performance of various embodiments of
the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0033] In describing preferred embodiments of the present invention illustrated in the 
drawings, specific terminology is employed for the sake of clarity. However, the invention is 
not intended to be limited to the specific terminology so selected, and it is to be understood
that each specific element includes all technical equivalents that operate in a similar manner
to accomplish a similar purpose. 

[0034] Figure 1 schematically shows the overall method for computer-aided diagnosis 
indicating the role of non-parametric classification. 

[0035] Classifiers such as linear discriminant analysis or artificial neural networks have
limitations, especially in a limited training database situation. Linear discriminant analysis
may fail, for example, in the XOR problem, where no single linear boundary separates the
two classes. Artificial neural networks tend to be complex and
difficult to model. Non-parametric classification can be applied to the various tasks in CAD
to improve the use of computerized image analysis in medical imaging by optimizing the 
computer output. 
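
By way of non-limiting illustration, the XOR limitation noted above can be sketched in Python. The data, the function names, and the kernel width of 0.2 are hypothetical choices made only for this sketch and are not taken from the test database. The sketch compares a Fisher linear discriminant with a classifier that compares kernel density estimates, both scored on the same sixteen training points.

```python
import math

# Four tight clusters in an XOR layout (illustrative data only).
OFFSETS = [(-0.1, -0.1), (-0.1, 0.1), (0.1, -0.1), (0.1, 0.1)]

def cluster(cx, cy):
    return [(cx + dx, cy + dy) for dx, dy in OFFSETS]

class_a = cluster(0, 0) + cluster(1, 1)   # one diagonal
class_b = cluster(0, 1) + cluster(1, 0)   # the other diagonal

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points):
    """Within-class scatter terms (sxx, sxy, syy)."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy

def lda_accuracy(a, b):
    """Fisher linear discriminant with a midpoint threshold, scored on a and b."""
    (ax, ay), (bx, by) = mean(a), mean(b)
    sa, sb = scatter(a), scatter(b)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))
    det = sxx * syy - sxy * sxy
    dx, dy = bx - ax, by - ay
    wx = (syy * dx - sxy * dy) / det      # w = Sw^-1 (mean_b - mean_a)
    wy = (sxx * dy - sxy * dx) / det
    thr = wx * (ax + bx) / 2 + wy * (ay + by) / 2
    correct = sum(wx * x + wy * y <= thr for x, y in a)
    correct += sum(wx * x + wy * y > thr for x, y in b)
    return correct / (len(a) + len(b))

def density(p, points, width=0.2):
    """Unnormalized sum of Gaussian kernels centered on the points."""
    return sum(math.exp(-0.5 * ((p[0] - x) ** 2 + (p[1] - y) ** 2) / width ** 2)
               for x, y in points)

def kde_accuracy(a, b):
    """Assign each point to the class with the larger density estimate."""
    correct = sum(density(p, a) > density(p, b) for p in a)
    correct += sum(density(p, b) > density(p, a) for p in b)
    return correct / (len(a) + len(b))
```

Because the two class means coincide in this layout, the linear discriminant has essentially no direction along which to separate the classes, whereas the density comparison follows the cluster structure.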

[0036] While the inventors have investigated various computer-extracted features of lesions 
(and their relationship to likelihood of malignancy), it is novel to combine such features using 
non-parametric classifiers in order to improve characterization of the lesion, image, and/or 
disease status, especially when limited databases for training are available. A particular 
example is given here using non-parametric classification in the task of distinguishing 
between malignant and benign mammographic lesions. 

[0037] Radiographically, mass lesions can be characterized (Refs. 7, 9) by, for example: 

■ Lesion Feature 1: degree of spiculation (spiked versus rounded),

■ Lesion Feature 2: margin definition (margin sharpness), 

■ Lesion Feature 3: shape, 

■ Lesion Feature 4: density (determined using average gray level, contrast, texture), 

■ Lesion Feature 5: homogeneity (texture), 

■ Lesion Feature 6: asymmetry, 

■ Lesion Feature 7: temporal stability, 

■ and so forth. 

Mass lesions from mammograms may be characterized using the inventors' earlier work 
(Refs. 17, 21, 23, 24) in which a characterization scheme based on the degree of spiculation is 
determined from a cumulative edge gradient histogram analysis in which the gradient is 
analyzed relative to the radial angle (Figure 2). The mass is first extracted from the anatomic 
background of the mammogram using automatic region-growing techniques (Ref. 17). 
Features are then obtained using cumulative edge gradient histogram analysis. In
the cumulative edge-gradient analysis, the maximum gradient and the angle of this gradient
relative to the radial direction are calculated.

[0038] Figure 2 illustrates the calculation of the FWHM (full width at half max) from the
cumulative gradient orientation histogram for a spiculated mass and a smooth mass. Note 
that here the spiculation feature (based on the radial direction) is used in distinguishing 
between spiculated lesions and round lesions. Also, the average gradient along the margin of 
a mass will be calculated to describe the sharpness of the margin. Higher values indicate a 
sharper margin and thus a higher likelihood that the lesion is benign. 
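
As a non-limiting sketch, a full width at half maximum may be read off a single-peaked histogram as follows. The function name `fwhm`, the use of linear interpolation between bins, and the assumption of a single peak are illustrative choices; the cited references should be consulted for the exact procedure applied to the cumulative gradient orientation histogram.

```python
def fwhm(centers, counts):
    """Full width at half maximum of a single-peaked histogram.

    centers: bin centers in increasing order; counts: histogram heights.
    The half-maximum crossings are located by linear interpolation.
    """
    half = max(counts) / 2.0
    inside = [i for i, c in enumerate(counts) if c >= half]
    first, last = inside[0], inside[-1]

    def crossing(i_out, i_in):
        # If the peak touches the edge of the histogram, use the bin center.
        if i_out < 0 or i_out >= len(counts):
            return centers[i_in]
        t = (half - counts[i_out]) / (counts[i_in] - counts[i_out])
        return centers[i_out] + t * (centers[i_in] - centers[i_out])

    return crossing(last + 1, last) - crossing(first - 1, first)
```

A broad distribution of gradient orientations gives a larger value than a narrowly peaked one, which is the property the spiculation feature relies on.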

[0039] In addition, a radial gradient index (normalized radial gradient) (Refs. 21, 69) that
describes the circularity and density characteristics of a lesion is used and is given by

$$
\mathrm{RGI} = \frac{\sum_{P \in L} \cos\varphi \, \sqrt{D_x^2 + D_y^2}}{\sum_{P \in L} \sqrt{D_x^2 + D_y^2}}
$$

where:

RGI is a radial gradient index that is normalized to take on values between -1 and +1,
$P$ is an image point,
$L$ is the detected lesion excluding the center part,
$D_x$ is the gradient in the x-direction,
$D_y$ is the gradient in the y-direction, and
$\varphi$ is the angle between gradient vector and connection line from center point to
neighbor point.
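
For illustration only, the radial gradient index may be evaluated on a pixel grid as sketched below. The function name, the use of central differences for the gradients, and the representation of the lesion as a list of interior pixel coordinates are assumptions of this sketch. The product of the gradient magnitude and the cosine of the angle is computed as the projection of the gradient onto the unit radial vector.

```python
import math

def radial_gradient_index(image, lesion_points, center):
    """Radial gradient index over the lesion pixels.

    image: 2-D list of gray levels indexed as image[row][col].
    lesion_points: (row, col) pixels of the lesion, excluding the center
        part; each must have neighbors on all four sides.
    center: (row, col) of the lesion center.
    """
    numerator = 0.0
    denominator = 0.0
    for r, c in lesion_points:
        # Central-difference gradients (an assumed discretization).
        d_x = (image[r][c + 1] - image[r][c - 1]) / 2.0
        d_y = (image[r + 1][c] - image[r - 1][c]) / 2.0
        # Vector from the center point to this pixel.
        rad_x, rad_y = c - center[1], r - center[0]
        dist = math.hypot(rad_x, rad_y)
        if dist == 0:
            continue
        # |gradient| * cos(phi) is the projection onto the radial direction.
        numerator += (d_x * rad_x + d_y * rad_y) / dist
        denominator += math.hypot(d_x, d_y)
    return numerator / denominator if denominator > 0 else 0.0
```

With this sign convention, an image whose gray level increases steadily away from the center gives a value near +1, and one whose gray level decreases away from the center gives a value near -1.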



[0040] Although the radiographic density of a mass may not be by itself as powerful a 
predictor in distinguishing between benign and malignant masses as its margin features, taken 
with these features, density assessment can be extremely useful. The evaluation of the 
density of a mass is of particular importance in diagnosing circumscribed, lobulated, 
indistinct, or obscured masses that are not spiculated. 

[0041] In order to assess the density of a mass radiographically, the present invention uses 
three density-related measures (average gray level, contrast, and texture measure) that 
characterize different aspects of the density of a mass. These measures are similar to those 
used intuitively by radiologists. Average gray level is obtained by averaging the gray level 
values of each point within the grown region of a mass. Contrast is the difference between 
the average gray level of the grown mass and the average gray level of the surrounding fatty 
areas (areas with gray-level values in the lower 20% of the histogram for the total surrounding 
area). Texture is defined here as the standard deviation of the average gradient within a mass
and it is used to quantify patterns arising from veins, trabeculae, and other structures that may
be visible through a low-density mass, but not through a high-density mass. A mass of low 
radiographic density should have low values of average gray level and contrast, and a high 
value of the texture measure, whereas a mass of high radiographic density should have high 
values of average gray level and contrast, and a low value of the texture measure. 
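
The three density-related measures can be sketched as follows. This is a minimal illustration under stated assumptions: the "lower 20% of the histogram" is interpreted here as the darkest fifth of the surrounding pixels, texture is taken as the standard deviation of the central-difference gradient magnitude, and all names are hypothetical.

```python
import math
import statistics

def density_measures(image, mass_points, surround_points):
    """Return (average gray level, contrast, texture) for a grown mass region.

    image: 2-D list indexed as image[row][col].
    mass_points: (row, col) pixels of the grown region; each must have
        neighbors on all four sides.
    surround_points: (row, col) pixels of the surrounding area.
    """
    avg_gray = statistics.fmean(image[r][c] for r, c in mass_points)

    # Fatty areas: assumed to be the darkest 20% of the surrounding pixels.
    surround = sorted(image[r][c] for r, c in surround_points)
    n_fatty = max(1, len(surround) // 5)
    contrast = avg_gray - statistics.fmean(surround[:n_fatty])

    # Texture: spread of the gradient magnitude inside the mass.
    gradients = [
        math.hypot((image[r][c + 1] - image[r][c - 1]) / 2.0,
                   (image[r + 1][c] - image[r - 1][c]) / 2.0)
        for r, c in mass_points
    ]
    texture = statistics.pstdev(gradients)
    return avg_gray, contrast, texture
```

A uniformly bright region on a dark background yields a high average gray level and contrast with zero texture, which matches the description of a high-density mass.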
[0042] Figure 3 shows the relationship between measures of spiculation and margin 
definition for malignant and benign mammographic masses. 

[0043] Non-parametric methods have been used for curve fitting in statistical analysis (Refs. 
27-29). In the present invention, however, non-parametric classifiers are used to merge
features (i.e., characteristics of the lesion or image) into a probability or status of disease. 
These classifiers are used to aid in the development of CAD methods in a limited database 
scenario. 

[0044] A signal/noise classifier based on the ratio of density probabilities at the observed 
point produces the maximal area under the ROC curve, being in this sense the "best" 
classifier possible. Such a classifier is created by (1) constructing estimators of the signal and 
noise densities and (2) classifying observations based on the ratio of the estimated probability 
densities. Non-parametric density methods may also be used to estimate probability densities 
of unknown functional forms. Non-parametric estimates are unbiased in the large number 
limit. One embodiment of the invention is the application of the approach outlined above for 
the classification of breast lesions detected on mammography, using a database of breast 
lesions (malignant or benign) which already have been analyzed by a computer system 
yielding computer-extracted lesion features. The non-parametric density estimate is the 
product of 'blurring' the observations (treated as Dirac 'delta' functions) with a suitably 
chosen kernel. A number of blurring kernels are available to construct the probability density 
estimates. Parabolic kernels of fixed size ($1-x^2$ and $(1-x^2)^2$, for $|x|<1$) are optimal in some
cases. Alternatively, the Gaussian kernel may be used, as it produces smooth, unbounded
density estimates (closer to our perception of what the "true" probability density should be).
The kernel may be of fixed size, or it can be adaptive (wider in regions where data are more
sparse, narrower in regions where data are more dense). In some cases adaptive kernels
offer faster convergence, but fixed-size kernels are preferable as they are more robust to
implement. In addition, the size of a fixed kernel can be found based on theoretical criteria.
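
The kernels named above may be sketched as follows, each scaled so that it integrates to one. The function names are illustrative. The width rule shown is Silverman's rule of thumb for a Gaussian kernel, offered only as one well-known example of a theoretical criterion; the specification does not state which criterion is used.

```python
import math
import statistics

def parabolic(x):
    """Kernel 1 - x^2 on |x| < 1, scaled by 3/4 for unit area."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1 else 0.0

def biweight(x):
    """Kernel (1 - x^2)^2 on |x| < 1, scaled by 15/16 for unit area."""
    return (15.0 / 16.0) * (1.0 - x * x) ** 2 if abs(x) < 1 else 0.0

def gaussian(x):
    """Standard Gaussian kernel; smooth and unbounded in support."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def rule_of_thumb_width(samples):
    """Silverman's rule of thumb (an assumed choice of criterion)."""
    n = len(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(statistics.stdev(samples), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)
```

A kernel of width h is then obtained as K(x/h)/h, so a single scalar controls the amount of blurring for a given feature.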
[0045] The probability densities in the feature space for benign and malignant lesions in a 
database can be estimated by summing up the blurring kernels centered on the observations,
thus yielding the likelihood ratios. In the evaluation, lesions from an independent database
can be classified based on the ratio of the estimated probability densities. The quality of fit
will be estimated by the area beneath the corresponding ROC curve.
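
As a minimal sketch, the area beneath an empirical ROC curve can be computed directly from classifier outputs as the fraction of malignant-benign pairs that are ranked correctly, with ties counted as one half. The function name is hypothetical, and this empirical estimate is not necessarily the same as an Az value obtained from a fitted ROC curve.

```python
def roc_area(malignant_scores, benign_scores):
    """Empirical area under the ROC curve from two lists of scores.

    Higher scores are assumed to indicate malignancy (for example,
    likelihood ratios).
    """
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))
```

A value of 1.0 indicates perfect separation of the two groups and 0.5 indicates chance-level ranking.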

[0046] Figure 4 shows an example for implementing non-parametric classification in CAD 
according to the present invention. The examples are given using a training database of 92 
malignant (cancerous) lesion images and 110 benign lesion images and an independent testing
database of 68 malignant lesion images and 38 benign lesion images. 

[0047] The present invention uses a non-parametric method for classifying mammographic 
lesions in order to estimate the probability density function (PDF) of malignant and benign 
lesions in the feature space. The feature space can consist of various features including the 
limited list above that are extracted by the computer to characterize the lesions. The present 
invention uses non-parametric smoothing with a kernel, K, to estimate the PDFs. Finally, a 
ratio of probability densities (i.e., the likelihood ratio) is used to classify the lesions. 
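[0048] The PDF estimator (i.e., the estimate of the PDF) is obtained by the following: 

$$\mathrm{PDF}(\vec{x}) = \sum_{i} K(\vec{x} - \vec{x}_i)$$

where the $\vec{x}_i$ are the observations (feature vectors) in the database and the kernel K 
may be paraboloid, Gaussian, Lorentzian or of other forms. 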
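
The estimator above can be sketched as follows for a Gaussian kernel. Two choices in this sketch are assumptions that the source formula does not show: the kernel is a product of one-dimensional Gaussians with one width per feature, and the sum is divided by the number of observations so that the estimate integrates to one. The names `gaussian_kernel` and `pdf_estimate` are hypothetical.

```python
import math


def gaussian_kernel(diff, widths):
    """Product of 1-D Gaussian kernels, one per feature (assumed form of K)."""
    value = 1.0
    for d, h in zip(diff, widths):
        value *= math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    return value


def pdf_estimate(x, observations, widths):
    """Sum the kernels centered on each observation, then normalize by count."""
    total = 0.0
    for xi in observations:
        diff = [a - b for a, b in zip(x, xi)]
        total += gaussian_kernel(diff, widths)
    # The 1/N normalization is an assumption; the source writes a plain sum.
    return total / len(observations)
```

Normalizing by the number of observations matters when the two classes have different sizes, since otherwise the larger class would yield a larger summed density everywhere.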
[0049] Figure 5 schematically shows the estimation of the probability density function of a 
given feature. The dot symbols indicate the feature values for seven potential malignant 
lesions. Each feature value is spread (blurred) using a specific kernel (size and shape) and 
the blurred contributions are then summed to yield the estimated PDF for that particular 
feature. Note that the kernel size and shape can be made adaptive to the denseness (or 
inversely to the sparsity) of the feature data points. This process is repeated for each 
feature type for the malignant lesions and for the benign lesions. 

[0050] Ultimately one obtains the PDFs for the malignant lesions ($\mathrm{PDF}_{\mathrm{malignant}}$) and for the 
benign lesions ($\mathrm{PDF}_{\mathrm{benign}}$). The estimate of the likelihood ratio is calculated from the 
estimates of $\mathrm{PDF}_{\mathrm{malignant}}$ and $\mathrm{PDF}_{\mathrm{benign}}$ for all feature values in the training database. 

$$\mathrm{LR}(\vec{x}) = \frac{\mathrm{PDF}_{\mathrm{malignant}}(\vec{x})}{\mathrm{PDF}_{\mathrm{benign}}(\vec{x})}$$

[0051] The likelihood ratio $\mathrm{LR}(\vec{x}_j)$ is then used to classify lesion j in the testing 
database, or any unknown lesion (or known lesion). 
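
The classification step can be sketched as below, assuming a Gaussian product kernel with fixed per-feature widths and a decision threshold of 1 on the likelihood ratio. The threshold, the small floor that guards against division by zero, and all function names are assumptions for illustration; the source does not specify an operating point.

```python
import math


def kde(x, observations, widths):
    """Mean of product-Gaussian kernels centered on each observation."""
    total = 0.0
    for xi in observations:
        k = 1.0
        for a, b, h in zip(x, xi, widths):
            k *= math.exp(-0.5 * ((a - b) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
        total += k
    return total / len(observations)


def likelihood_ratio(x, malignant, benign, widths, floor=1e-300):
    """LR(x) = PDF_malignant(x) / PDF_benign(x); floor avoids division by zero."""
    return kde(x, malignant, widths) / max(kde(x, benign, widths), floor)


def classify(x, malignant, benign, widths, threshold=1.0):
    """Label x as malignant when LR(x) exceeds the (assumed) threshold."""
    lr = likelihood_ratio(x, malignant, benign, widths)
    return "malignant" if lr > threshold else "benign"


# Synthetic two-feature training values, not data from the source.
toy_malignant = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.9)]
toy_benign = [(0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
toy_widths = (0.1, 0.1)
```

Moving the threshold away from 1 trades sensitivity against specificity, which is what sweeping the threshold along an ROC curve does.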
[0052] In a test of this invention, each lesion image was characterized by five computer-
extracted features: radial gradient of margin, spiculation, margin sharpness, texture, and 
average gray value. Then, the lesions were classified using combinations of features, two at a 
time, using the non-parametric classification method. The kernel was a Gaussian kernel with 
the kernel width for a specific feature being a percentage of the range of the values for that 
feature over all the lesions. Note that for a given feature, the kernel width was kept fixed in 
determining the PDF. In an alternative embodiment the width could be varied to be, for 
example, larger when fewer data points are available. This is schematically illustrated in 
Figure 6, in which the width of the kernel for the sparser-spaced data is larger. 
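
The two width choices described above can be sketched as follows. The fixed width follows the text (a fraction of each feature's range over all lesions). The adaptive variant uses the distance to the k-th nearest neighbor as the local width, which is only one possible scheme and is an assumption here, as are the function names and the default parameter values.

```python
def widths_from_range(feature_vectors, fraction=0.10):
    """Fixed kernel width per feature: a fraction of that feature's range."""
    columns = list(zip(*feature_vectors))
    return [fraction * (max(col) - min(col)) for col in columns]


def adaptive_widths(values, k=1, floor=1e-6):
    """One width per observation of a single feature (assumed scheme).

    The width is the distance to the k-th nearest other observation, so
    sparsely spaced points receive wider kernels.
    """
    widths = []
    for i, v in enumerate(values):
        distances = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        widths.append(max(distances[k - 1], floor))
    return widths
```

For example, an isolated feature value far from the rest of the data receives a much wider kernel than values in a tight cluster.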

[0053] Figure 7 shows, for the test performed, the 2-dimensional distribution of the two 
features (spiculation and radial gradient along the margin) for malignant and benign lesions in 
the training database (i.e., a consistency result). In this test, a Gaussian kernel size of 10% of 
the feature range was employed. The separation line, indicated by the zero notation, yields an 
area under the ROC curve of 0.86 for the two-feature, non-parametric classifier in the task of 
distinguishing between malignant and benign lesions. 
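
One reading of the "zero notation" is that the separation line is the contour on which the logarithm of the likelihood ratio vanishes. The source does not spell this out, so the relation below is a sketch under that assumption:

```latex
\ln \mathrm{LR}(\vec{x})
  = \ln \mathrm{PDF}_{\mathrm{malignant}}(\vec{x}) - \ln \mathrm{PDF}_{\mathrm{benign}}(\vec{x})
  = 0
\quad\Longleftrightarrow\quad
\mathrm{PDF}_{\mathrm{malignant}}(\vec{x}) = \mathrm{PDF}_{\mathrm{benign}}(\vec{x})
```

Under this reading, points on one side of the line are more likely malignant than benign according to the estimated densities, and points on the other side are more likely benign.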

[0054] Figure 8 demonstrates the corresponding 2-dimensional distribution for the 
independent testing database (i.e., a validation result). The separation line, indicated by the 
zero notation, yields an area under the ROC curve of 0.81 for the two-feature, non-parametric 
classifier in the task of distinguishing between malignant and benign lesions. 
[0055] Figure 9 illustrates the effect of kernel size on the performance of the classifier in the 
task of distinguishing between malignant and benign lesions. Note that the classifier is quite 
robust over a range of kernel sizes. 
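
A kernel-size study of this kind can be sketched for a single feature as below. The kernel width is set to a fraction of the pooled training range, and the testing cases are scored by the likelihood ratio. The empirical AUC, the synthetic values, and all names are assumptions for illustration; none of them reproduce the data or results of the test described here.

```python
import math


def kde_1d(x, sample, h):
    """Fixed-width Gaussian kernel density estimate at x for one feature."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def empirical_auc(pos, neg):
    """Fraction of (pos, neg) pairs ranked correctly; ties count one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sweep_kernel_size(train_m, train_b, test_m, test_b, fractions):
    """Return {fraction: AUC} with kernel width = fraction * training range."""
    pooled = train_m + train_b
    feature_range = max(pooled) - min(pooled)
    results = {}
    for f in fractions:
        h = f * feature_range

        def score(x, h=h):
            # Likelihood ratio; the tiny constant guards against division by zero.
            return kde_1d(x, train_m, h) / (kde_1d(x, train_b, h) + 1e-300)

        results[f] = empirical_auc([score(x) for x in test_m],
                                   [score(x) for x in test_b])
    return results


# Synthetic, partly overlapping toy values (not data from the source).
train_b = [0.1 * i for i in range(10)]
train_m = [0.5 + 0.1 * i for i in range(10)]
test_b = [0.05 + 0.1 * i for i in range(9)]
test_m = [0.55 + 0.1 * i for i in range(9)]
aucs = sweep_kernel_size(train_m, train_b, test_m, test_b, [0.05, 0.10, 0.20, 0.40])
```

Plotting the returned AUC values against the kernel fractions gives a curve of the same kind as the one described for Figure 9.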

[0056] The table below gives performance results for the non-parametric classifier in which 
features were merged two at a time. The method can be extended to merge more than two 
features as the database grows. Here ROC analysis (Ref. 26) was used to determine the 
performance of the combined feature sets in the task of classifying lesions as malignant or 
benign. The validation result is given. 

                    Spiculation    Margin sharpness    Texture    Average gray level

RadGrad             0.83           0.79                0.73       0.76
Spiculation         -              0.79                0.74       0.78
Margin sharpness    -              -                   0.51       0.54
Texture             -              -                   -          0.53

Table 1: Area under Receiver Operating Characteristic (ROC) curve 
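
The ten entries of the table correspond to the ten unordered pairs that can be formed from five features. The sketch below enumerates those pairs and applies a caller-supplied evaluation function to each. The function `evaluate_pairs` and its `evaluate` argument are hypothetical placeholders, not part of the method as described.

```python
from itertools import combinations

FEATURES = [
    "RadGrad",
    "Spiculation",
    "Margin sharpness",
    "Texture",
    "Average gray level",
]


def evaluate_pairs(features, evaluate):
    """Apply `evaluate` to every unordered pair of features.

    `evaluate` is a caller-supplied function (hypothetical here) that
    would train and test a two-feature classifier and return its AUC.
    """
    return {pair: evaluate(pair) for pair in combinations(features, 2)}


feature_pairs = list(combinations(FEATURES, 2))
```

The pairs come out in the same upper-triangular order as the table rows: all pairs starting with RadGrad first, then those starting with Spiculation, and so on.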



[0057] It is evident from this testing that use of a non-parametric classifier can contribute to 
the classification of mass lesions by a computer, and likewise, can be expected to improve 
diagnoses. In addition, use of an adaptive kernel size dependent on the sparseness of feature 
data can be expected to improve the classification, especially when a limited database is used 
in training. 

[0058] Although the method has been presented using mammographic breast image data sets, 
the inventive non-parametric CAD analysis method can be implemented on other breast 
images (such as sonograms) in which a computerized image analysis is performed with 
respect to some disease state, or it can be implemented on other medical images (such as chest 
radiographs or CT scans) with respect to some disease state or state of risk. 
[0059] Numerous modifications and variations of the present invention are possible in light 
of the above teachings. It is therefore to be understood that within the scope of the appended 
claims and their equivalents, the invention may be practiced otherwise than as specifically 
described herein. 